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Abstract 


Structural and physiological changes in the female reproductive system underlie the origins of 
pregnancy in multiple vertebrate lineages. In mammals, the glandular portion of the lower 
reproductive tract has transformed into a structure specialized for supporting fetal development. 
These specializations range from relatively simple maternal nutrient provisioning in egg-laying 
monotremes to an elaborate suite of traits that support intimate maternal-fetal interactions in 
Eutherians. Among these traits are the maternal decidua and fetal component of the placenta, 
but there is considerable uncertainty about how these structures evolved. Previously we showed 
that changes in uterine gene expression contributes to several evolutionary innovations during 
the origins of pregnancy (Marinic, Mika, and Lynch 2021). Here we reconstruct the evolution of 
entire transcriptomes ("ancestral transcriptome reconstruction") and show that maternal gene 
expression profiles are correlated with degree of placental invasion. These results indicate that 
an epitheliochorial-like placenta evolved early in the mammalian stem-lineage and that the 
ancestor of Eutherians had a hemochorial placenta, and suggest maternal control of placental 
invasiveness. These data resolve major transitions in the evolution of pregnancy and indicate 
that ancestral transcriptome reconstruction can be used to study the function of ancestral cell, 


tissue, and organ systems. 
Introduction 


Studies of the fossil record have revealed in fine detail major stages in the origin and 
diversification of evolutionary novelties (structures) and innovations (functions) such as the 
vertebrate limb and skull (Abzhanov, 2015; Hirasawa and Kuratani, 2015), the turtle shell (Lyson 
and Bever, 2020), feathers (Chen et al., 2015), and flowers (Chanderbali et al., 2016). Many 
features of soft tissues and macromolecular structures, however, are lost during the fossilization 
process and thus leave little to no trace in the fossil record. To reconstruct the history of these 
characters, including DNA and amino acid sequences, morphology, physiology, and behavior, 
among others, evolutionary studies have traditionally relied on comparative (phylogenetic) 
methods such as parsimony or model-based maximum likelihood or Bayesian inference (Lewis 
and Olmstead, 2001). These methods infer ancestral characters from the distribution of 
character states among extant species, the latter of which also allows increased model 
complexity including unequal character state transitions (Lewis and Olmstead, 2001; Pagel, 
1999), variable rates among sites and branches (Galtier, 2001; Yang, 1994, 1993), and 


character dependent diversification (Maddison et al., 2007). 
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Extant mammals span crucial transitions in the origins of pregnancy (Figure 1A) and are 
an excellent system in which to explore the origins of evolutionary novelties. The platypus and 
echidna (monotremes), for example, are oviparous (egg-laying) but retain the egg in the 
glandular portion of uterus for about two weeks. During this period the developing embryo is 
nourished by uterine secretions (matrotrophy) delivered through a simple yolk sac placenta 
(Hughes and Hall, 1998; Renfree and Shaw, 2013). Viviparity (live-birth) evolved in the stem- 
lineage of therian mammals, but marsupials and eutherians have very different reproductive 
strategies, particularly in the ontogenetic origins of the definitive placenta and the arrangement 
of the maternal-fetal interface (Freyer and Renfree, 2009; Renfree and Shaw, 2013; Renfree, 
2010). In most marsupials, the embryonic portion of the placenta is derived from the yolk sac, 
which may come into direct contact with but does not invade the maternal endometrium. While 
the yolk sac has essential functions during early pregnancy in eutherians, the definitive placenta 
is derived from chorion and allantois (chorioallantois) and varies in its degree of endometrial 
invasion (Swanson and Skinner, 2018)(Figure 1B). Thus, matrotrophy and a yolk-sac derived 
placenta were present in the mammalian stem-lineage and transitioned to a chorioallantoic 


placenta in eutherians. 


Unfortunately, most characters related to pregnancy and the nature of the maternal-fetal 
interface leave little to no trace in the fossil record. Thus, studies exploring the evolution of 
pregnancy have relied on comparing morphological differences between extant mammals to 
reconstruct the steps in the origins of pregnancy. These comparative analyses have used 
multiple methods to reconstruct the arrangement of the mammalian maternal-fetal interface and 
reached contradictory conclusions about the degree of placental invasiveness in the eutherian 
ancestor, a debate which has persisted since at least 1876 (Table 1). Here, we use comparative 
transcriptomics and maximum likelihood to infer ancestral gene expression states (ancestral 
transcriptome reconstruction) from 23 amniotes with different parity modes and degrees of 
placental invasion to identify the evolutionary history of the maternal-fetal interface. We found 
strong evidence that the last common ancestor of eutherian mammals had an invasive 
hemochorial placenta, as well as convergence in gene expression profiles during the 
independent evolution of non-invasive epitheliochorial placentas in marsupials and some 
eutherian lineages. These data indicate that the degree of placental invasion can be inferred 
from endometrial gene expression profiles, and suggest that placental invasiveness is regulated 
by gene expression profiles in the maternal endometrium rather than the fetal portion of the 


placenta. 
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Table 1. Inferences of placental invasiveness in the eutherian ancestor 1880-2021. Note 
that we follow Mossman (Mossman, 1937, p. 156) and define a placenta as “any intimate 


apposition or fusion of the fetal organs to the maternal (or paternal) tissues for physiological 


exchange.” 


Study 

(Turner, 1876) 
(Haeckel 1883) 
(Minot, 1891) 
(Wislocki, 1929) 
(Hill, 1932) 
(Mossman, 1937) 
(Portmann, 1938) 
(Martin, 1969) 
(Kihlstrom, 1972) 


(Luckett, 1976, 1975, 
1974) 


(Carter and Enders, 2004) 


(Vogel, 2005) 
(Elliot and Crespi, 2006) 
(Mess and Carter, 2006) 


(Wildman et al., 2006) 


(Martin, 2008) 


(Elliot and Crespi, 2009) 


(McGowen et al., 2014) 
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Ontogeny 

Implicit Parsimony? 
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Ontogeny 

Ontogeny / Implicit Parsimony 
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type Correlation 
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Implicit Parsimony 

Maximum Likelihood 
Maximum Parsimony 


Maximum Parsimony & 
Maximum Likelihood 


Parsimony 


Maximum Parsimony & 
Maximum Likelihood 


Maximum Likelihood 


State 
Epitheliochorial 
“Non-Invasive” 
Hemochorial 
Hemochorial 
Epitheliochorial 
Endotheliochorial 
Hemochorial 
Endotheliochorial 


(Endothelio/nemo)chorial 


Epitheliochorial 


Endotheliochorial 
Hemochorial 
Hemochorial 
(Endothelio/hemo)chorial 
Hemochorial 
Endotheliochorial 
(ordered) or 
(Endothelio/nemo)chorial 
(unordered) 
(Endothelio/hemo)chorial 
(MP) or Hemochorial 
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(Superficial) 
Hemochorial 


96 


97 


98 

99 
100 
101 
102 
103 
104 
105 
106 
107 
108 
109 
110 
111 


112 


113 
114 
115 
116 
117 
118 
119 
120 
121 
122 
123 
124 
125 
126 
127 


Results 
Endometrial Gene Expression Profiling 


We previously assembled a collection of transcriptomes from the pregnant or gravid 
uterine endometrium of therian mammals with varying degrees of placental invasiveness, as 
well as a monotreme (platypus), a bird, as well as viviparous, oviparous, and reproductively bi- 
modal lizards (Marinié et al., 2021). The complete dataset includes expression information for 
21,750 genes from 23 species (Figure 2—source data 1). Principal Component Analysis (PCA) 
based on gene expression levels in transcripts per million (TPMs), using a variant of PCA that 
extends PCA to binary data, generally grouped species randomly (Figure 2A), consistent with 
noise in gene expression data overwhelming phylogenetic signal. Therefore, we transformed 
quantitative gene expression values in transcripts per million (TPM) into discrete character 
states — Genes with TPM 2 2.0 were coded as expressed (state=1), genes with TPM < 2.0 were 
coded as not expressed (state=0), and genes without data in specific species coded as missing 
(?) (Marini¢ et al., 2021; Mika et al., 2021). In contrast to PCA based on TPM values, PCA of the 
binary encoded endometrial transcriptome dataset grouped species by phylogenetic relatedness 


(Figure 2B), indicating significant noise reduction in the binary encoded dataset. 
Phylogenetic analyses of endometrial transcriptomes 


We used IQ-TREE to infer the best fitting model of character evolution and the 
maximum-likelinood (ML) phylogeny. The best-fit model of character evolution was a general 
time reversible model for binary data (GTR2) with character state frequencies optimized by 
maximum-likelinood (FO) and rate heterogeneity across sites accommodated with the FreeRate 
model that had three rate categories (R3). Next, we assessed the tree topology and branch 
support metrics for the ML phylogeny inferred using the binary encoded endometrial gene 
expression dataset and the GTR2+FO+R3 model. The ML phylogeny (Figure 3A) generally 
followed taxonomic relationships (Figure 3B), however, four particularly discordant relationships 
within therian mammals were inferred with high support: 1) Rather than grouping marsupials 
into a monophyletic sister-clade to the eutherians, opossum and wallaby were placed as sister- 
species within the Boreoeutheria; 2) Armadillo groups within the Euarchontoglires rather than as 
sister to all other eutherians; 3) Bat groups within the Euarchontoglires rather than within 
Laurasiatheria; and 4) Dog groups within the Euarchontoglires rather than with Laurasiatheria 
(Figure 3A/B). We also used multiple non-parametric topology tests to directly compare the ML 


tree to alternative trees with the correct phylogenetic placement of these four discordant 
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lineages, all of which rejected alternative “corrected” trees in favor of the (Figure 3C). 
Remarkably while these discordant relationships are incorrect with respect to the species 
phylogeny, they are correct with respect to placenta-type, i.e., species form well-supported 
clades based on their degree of placental invasiveness: Wallaby and opossum, which have 
epitheliochorial placentas similar to ungulates forms a clade with ungulates, Armadillo, which 
has an invasive placenta (Carter and Enders, 2004; Chavan and Wagner, 2016), forms a clade 
with the other species with hemochorial placentas, and dog, which has an _ invasive 
endotheliochorial placenta, forms a clade with Euarchontoglires that have invasive hemochorial 


placentas. 
Ancestral transcriptome reconstruction and fuzzy C-Means clustering 


We also used IQ-TREE and the species phylogeny (Figure 3B) to reconstruct ancestral 
gene expression states for each gene (ancestral transcriptome reconstruction). To explore the 
similarity of extant and reconstructed transcriptomes we used Fuzzy C-Means (FCM) clustering, 
a “soft” clustering method that allows each sample to have membership in multiple clusters and 
assigns samples to clusters based on their degree of cluster membership. FCM with two to four 
clusters (K=2-4) had a clear biological interpretation (Figure 4): 1) K=2 clustered eutherians and 
non-eutherians; 2) K=3 clustered most therians with non-invasive (epitheliochorial) placentas, 
eutherians with invasive (endotheliochorial or hemochorial) placentas, platypus and sauropsids 
(i.e. viviparous and oviparous lizards, and birds); 3) K=4 clustered eutherians with invasive 
placentas, eutherians with epitheliochorial placentas, opossum/wallaby, and sauropsids. A 
notable exception with K=3-4 is the cluster membership of dunnart, which is discussed in 
greater detail below. FCM clusters with K=5 and K=6 were similar to K=4, but divided 
Eutherians with hemochorial placentas into two clusters, and clustered dog, dunnart, and the 
viviparous skink Chalcides ocellatus (Figure 4); Beyond K=6 clusters had no clear biological 


interpretation. 


Ancestral transcriptome reconstructions generally clustered with extant species having 
similar parity modes and degrees of placenta invasiveness (Figure 4). For example, FCM with 
K=2 grouped extant eutherians and their ancestral lineages as well as extant non-eutherians 
and their ancestral lineages. Similarly, FCM with K=4 grouped extant eutherians with invasive 
placentas and their ancestral lineages with extant therians with non-invasive placentas and their 
ancestral lineages. FCM with K=3-5 clustered the ancestral eutherian (AncEutheria) 


transcriptome with extant species that have invasive hemochorial placentas and clustered the 
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ancestral therian (AncTheria) and mammalian (AncMammalia) transcriptomes with extant 
mammals that have non-invasive epitheliochorial placentas. These data suggest that FCM 
clustering of extant and ancestral reconstructions of endometrial transcriptomes can predict 
ancestral placenta invasiveness, implying that an epitheliochorial placenta evolved early in the 
development of mammalian pregnancy and that a hemochorial placenta is ancestral for 


eutherians. 


While FCM clustering generally groups extant and ancestral transcriptomes by 
phylogenetic relatedness and degree of placental invasiveness, a notable exception is the 
marsupial fat-tailed dunnart (Sminthopsis crassicaudata). FCM clusters with K=2-5 grouped 
dunnart with non-therians while FCM K=6 clustered dog, dunnart, and the skink, C. ocellatus 
(Figure 4). FCM cluster membership coefficients of all three species with K=2—5 were mixed, 
with significant membership across clusters. In contrast, dog, dunnart, and C. ocellatus formed 
a distinct cluster from all other species at K=6 with nearly 100 % FCM cluster membership in 
group 6 (Figure 4). Remarkably, both dog and dunnart have endotheliochorial placentas 
whereas the Chalcides maternal-fetal interface has been described as either endotheliochorial 
or epitheliochorial with extensive vascularization and interdigitating folds of hypertrophied 
uterine and chorioallantoic tissue (Blackburn, 1993; Blackburn and Callard, 1997; Corso et al., 
2000). These data suggest that species with endotheliochorial placentas have a gene 
expression profile that is intermediate between epitheliochorial or hemochorial placentas, yet is 
also distinct, and that gene expression at the Chalcides maternal-fetal interface are converging 


on a therian-like endotheliochorial pattern. 
Convergent loss of RORA in species with epitheliochorial placentas 


Among the genes with discordant species- and gene-expression phylogenies is RAR- 
related orphan receptor alpha (RORA), an orphan nuclear hormone receptor that regulates the 
development and function of type 2 and type 3 innate lymphoid cells (ILC2 and ILC3), which 
negatively regulates the immune system in local microenvironments especially during 
inflammation (Haim-Vilmovsky et al., 2020), the establishment of tolerance to intestinal 
microbiota (Lyu et al., 2022), and the regulation of inflammatory responses and vascular 
remodeling during placentation and pregnancy (Balmas et al., 2018; Mendes et al., 2020; Miller 
et al., 2018). Remarkably RORA independently lost endometrial expression at least four times, 
each loss coincident with the independent evolution of non-invasive epitheliochorial placentas 


(Figure 5A). The restricted expression of RORA to immune cells in the human first trimester 
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decidua (Vento-Tormo et al., 2018) (Figure 4B) suggests that the repeated loss of RORA 
expression reflects changes in the composition of immune cells at the maternal-fetal interface in 
species with epitheliochorial placentas. To explore this possibility, we used CIBERSORT to 
deconvolve endometrial bulk RNA-Seq datasets from each species into cell-type abundance 
estimates with a signature gene expression matrix composed of cell-types from the human first 
trimester maternal-fetal interface. We found that CIBERSORT inferred that nearly all species 
that lacked RORA expression also lacked ILC2/3 cells at the maternal-fetal interface with the 
exception of armadillo and chicken which had RORA expression but were inferred to lack 
endometrial ILC2/3 cells, and the viviparous Saiphos population which has an epitheliochorial 
placenta and RORA expression and was inferred to have a population of ILC2/3 cells. Thus, we 
conclude that there is convergent loss of RORA in species with epitheliochorial placentas which 
in many species correlates with the convergent loss of ILC2/3 cells at the maternal-fetal 


interface. 
Discussion 


One of the central of goals of DevoEvo is a mechanistic explanation for the origin and 
evolution of evolutionary novelties (Amundson, 2005; Brigandt and Love, 2010; Lynch, 2022; 
Wagner, 2001, 2000; Wagner and Larsson, 2003). A major challenge for reconstructing the 
origins of evolutionary novelties, however, is a lack of transitional forms among living species. 
For example, while feathers and the turtle’s shell are excellent examples of morphological 
novelties, there are no living species with transitional forms between scale and feather (Chen et 
al., 2015) or with protoshells (Lyson and Bever, 2020). Despite the lack of transitional forms 
among extant taxa, an abundance of fossil data has reconstructed the major steps in the 
evolution of these structures that when combined with molecular studies of their development 
provides a rich explanation for their developmental evolution. Unfortunately, many features of 
soft tissues leave little to no trace in the fossil record, thus the steps in their evolution have 
remained elusive. Here we used ancestral transcriptome reconstruction to trace gene 
expression changes during the origins of mammalian pregnancy, in which extant species 
preserve intermediate stages in the evolution of both pregnancy and a diversity of placenta 


types and reconstruct the evolution of placental invasiveness of ancestral species. 
Binary encoding uncovers hidden biological signal 


While transforming gene expression count data into binary categories likely loses 


information about evolutionarily relevant variation in expression levels, it may not be possible to 
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infer meaningful gene expression levels from bulk RNA-Seq. For example rather than 
evolutionary differences between individuals or species, variation in transcript abundance 
between samples can result from various sources of experimental noise such as technical 
variation in library preparation, sequencing, or batch effects (Gilad and Mizrahi-Man, 2015; Tung 
et al., 2017), variation in cell-type composition of a tissue (Price et al., 2022), sampling different 
timepoints in development or only a few individuals that do not capture the variance properties 
of gene expression levels within a population or species (Pal et al., 2021; Thompson et al., 
2020). Thus, by transforming gene expression data into not/expressed states we may reduce 
the potential for these and other biases to influence our ancestral transcriptome reconstructions, 
revealing biological signal. Our binary encoded endometrial gene expression dataset, for 
example, cluster species with similar placenta types and parity mode (Figure 2B) indicating 
binary encoding preserves functional signal in gene expression data that is otherwise masked 


by variation in gene expression levels (Figure 2A). 
Phylogenetic analyses of endometrial transcriptomes 


Previous studies of molecular phylogenies have shown that gene-species tree 
discordance can result from convergent or parallel amino acid changes in lineages that 
independently evolved morphological traits. For example, parallel amino acid changes in prestin 
(SLC26A5), a motor protein expressed in outer hair cells in the cochlea that essential for 
hearing, have occurred in different lineages of echolocating mammals, including multiple 
lineages bats and whales, leading a strongly monophyletic clades in gene trees that do not 
reflect taxonomic relationships (Li et al., 2008, 2010; Liu et al., 2010; Teeling, 2009). Gene- 
species tree discordance is also associated coadaptation of amino acid substitutions in Na*K’— 
ATPase of frogs that prey on toxic toads (Mohammadi et al., 2021), the evolution of Cy, 
phosphoenolpyruvate carboxykinases (PCK) in grasses (Christin et al., 2009), and between 
snake and agamid lizard mitochondrial genomes (Castoe et al., 2009). These examples 
highlight how phylogenetic discordance can be a signal of convergent evolution, rather than just 
a sign of biased phylogenetic inference that can arise from processes like long branch attraction 
(Bergsten, 2005; Felsenstein, 1978), incomplete lineage sorting and introgression (Guerrero 
and Hahn, 2018; Hibbins et al., 2020; Maddison and Knowles, 2006), biased character state 
frequencies and gene conversion (Figuet et al., 2014; Kostka et al., 2012; Lartillot, 2013; 
Romiguier et al., 2013), and heterotachy (Kolaczkowski and Thornton, 2008; Philippe et al., 
2005), among others. 
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Similar to the signals of convergence from gene-species tree discordance, our 
transcriptomic data showed significant phylogenetic support for uterine transcriptomes grouping 
by parity mode and degree of placental invasiveness rather than phylogenetic relationships. 
This transcriptome-species tree discordance is particularly striking for opossum and wallaby, 
which are deeply nested within eutherians with epitheliochorial placentas based on uterine 
transcriptome data, because the eutherian and marsupial placenta is derived from different 
extra-embryonic tissues — the chorion and allantois in eutherians and the yolk-sac in marsupials. 
Thus, there must be significant convergence in endometrial gene expression profiles between 
eutherians with chorioallantois-derived epitheliochorial placentas and marsupials with yolk-sac 
epitheliochorial placentas because our transcriptome phylogeny is based on gene expression in 
the endometrium during pregnancy rather than gene expression in the placenta. This 
convergence in gene expression between species with non-invasive epitheliochorial placentas 
may be related to the expression of genes that limit the ability of the trophoblast to invade into 


maternal tissues (see below). 
Reconstruction of ancestral (endometrial) transcriptomes 


Previous studies have explored transcriptome evolution with the goal of developing 
methods of ancestral transcriptome inference (Price et al., 2022), characterizing the general 
tempo and mode of gene expression evolution (Bauernfeind et al., 2021), implicating 
transcriptome evolution in the origin and evolution of morphological traits (Church et al., 2022; 
Katelyn et al., 2021; Lynch et al., 2015; Marinié et al., 2021; Munro et al., 2021), and identifying 
specific genes with derived expression levels (Brawand et al., 2011; Necsulea et al., 2014), 
usually in the context of characterizing genes whose expression levels have changed because 
of the action of positive selection, i.e., directional selection on transcript abundance (Gu, 2004; 
Price et al., 2022; Yang et al., 2020). In contrast, only a few studies have treated the expression 
of individual genes as a discrete character, transforming quantitative gene expression levels into 
binary not/expressed states, and used ancestral state reconstruction to determine the 
expression state of all genes in an ancestral transcriptome rather than changes in the 
expression levels of specific genes (Katelyn et al., 2021; Kin et al., 2015; Lynch et al., 2015; 
Marinié et al., 2021). Our ancestral transcriptome reconstructions of gene expression state 
indicate that an epitheliochorial-like placenta evolved early in the mammalian stem-lineage, 
before the loss of the egg-shell in Therians, and that the ancestor of Eutherians had a 
hemochorial placenta, which resolves a longstanding debate about the nature of ancestral 


mammalian placentas. These data suggest that ancestral transcriptome reconstruction can be 
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used to infer the function of ancestral cell, tissue, and organ systems which leave little to no 


trace in the fossil record even if soft tissues might by chance fossilize. 
Convergent loss of RORA in species with epitheliochorial placentas 


Our observation that RORA expression has been lost multiple times in species with non- 
invasive epitheliochorial placentas, which is coincident with the inferred absence of ILC2/ILC3 
cells from the endometrium during at least one stage in pregnancy. This result suggests a 
mechanistic connection between absence of decidual RORA expression, the absence of 
decidual ILC2/ILC3 cells, and the loss of placental invasion. Indeed, the major decidual ILC 
produce signaling factors such as GM-CSF, XCL1, MIP1a, and MIP1B, whose receptors are 
expressed by EVT likely regulate placental invasion (Huhn et al., 2020). ILC2 also contributes to 
the maintenance of a type-2 anti-inflammatory immune environment in the uterus during 
pregnancy (Balmas et al., 2018), which may be particularly important in species in which the 
endotheliochorial or haemochorial placenta invades maternal tissues. ILC2s, for example, are 
increased in the decidua basalis of women with spontaneous preterm labor compared to those 
who delivered preterm without labor (Mendes et al., 2021; Xu et al., 2018), suggesting that ILCs 
may participate in the chronic inflammatory process that occurs during pregnancy. Thus, loss of 
placental invasion may be associated with a reduced need to limit local inflammation by ILCs, 
leading to an evolutionary loss of ILCs in the endometrium during pregnancy. However, while 
our data are consistent this role for decidual ILC2 and its association with placental invasion, 
more detailed studies specifically aimed at characterizing cell-type composition differences 
across species are necessary to determine if there are correlations between cell-types in the 


endometrium during pregnancy and placenta type. 
Ideas and Speculation: Maternal control of placental invasion 


Our observation that endometrial gene expression patterns are correlated with degree of 
placental invasiveness might seem surprising, however, rather than acting as a passive 
substrate into which the trophoblast invades, the endometrium directly controls trophoblast 
invasion (Cui et al., 2012; Graham and Lala, 1991). For example, the trophoblast of mammals 
with hemochorial placentas, such as humans and rodents, is only permissive to invasion when 
the “window of implantation” is opened by the endometrium. Similarly, while the trophoblast of 
mammals with non-invasive endotheliochorial and epitheliochorial placentas, such as cats, 
dogs, horses, cows, pigs, and sheep cannot invade into the endometrium, they can invade into 


ectopic sites (reviewed in (Corpa, 2006)). The trophoblasts of guinea pig (Loeb, 1914), mouse 
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(Billington, 1965), rat (Jollie, 1961), and pig (Samuel and Perry, 1972) also invade ectopic sites 
in experimentally induced ectopic pregnancy. While there is no similar ectopic pregnancy data 
for marsupials, some marsupial lineages, including dunnart, have evolved invasive placentation. 
In contrast, there is no invasion of maternal tissues during ectopic pregnancy in viviparous 
reptiles with epitheliochorial placentation such as Pseudemoia entrecasteauxii (Griffith et al., 
2013). Thus, the invasive ability of trophoblasts most likely either evolved in the stem-lineage of 
therian mammals or multiple times, including in the stem-lineage of eutherians and some 
lineages of marsupials. Regardless, ancestral maternal control of placental invasion likely 


allowed us to infer ancestral placental invasiveness from ancestral endometrial transcripomes. 
Ideas and Speculation: Evolution of placental invasion and cancer metastasis 


Numerous authors have noted the similarity between placental invasion and cancer 
metastasis (Costanzo et al., 2018; Ferretti et al., 2006; Kozlov, 2022; Lala et al., 2021; Manzo, 
2019; Murray and Lessey, 1999; Perry et al., 2009; Piechowski, 2019), which was first proposed 
in 1902 by Scottish embryologist John Beard (1858-1924) who hypothesized that ectopic 
trophoblasts gave rise cancer (Gurchot, 1975; Ross, 2014). While Beard’s hypothesis is 
incorrect, at least for cancers not derived from the placenta such as choriocarcinoma, there are 
numerous mechanistic similarities between implantation, placental invasion, and tumor 
progression to malignancy (Nordor et al., 2017; Wagner et al., 2021). These data suggest that 
the evolution of maternal mechanisms that prevent endometrial invasion through expression 
gain and loss of genes that restrain and promote, respectively, trophoblast invasion, may be 
related to resistance to metastasis in eutherian lineages with non-invasive placentas (Boddy et 
al., 2020; Kshitiz et al., 2019; Wagner et al., 2020). For example, pleiotropy can lead to 
correlated patterns of gene expression between the transcriptomes of different tissue and organ 
systems (Liang et al., 2018). Thus, the evolution of a gene regulatory module that restricts 
placental invasion into the endometrium can be coopted to restrict implantation and spread of 


cancer cells into metastatic locations. 


Caveats and limitations 


A limitation of this study not directly addressed thus far is that we have only sampled a 


small number of species. For example, we lack pregnant endometrial samples from most 
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mammals, particularly those with endotheliochorial placentas, as well as a diversity of oviparous 
and viviparous squamates (there are at least 115 origins of viviparity in squamates (Blackburn, 
2015; Blackburn and Brandley, 2015; Blackburn and Starck, 2015). Thus, our inferences from 
phylogenetic, ancestral reconstruction, and clustering analyses may be biased by small sample 
sizes and non-random sampling. We also assume that models of evolution designed for 
phylogenetic inference and ancestral reconstruction of morphological and molecular data are 
appropriate for gene expression data or binary encoded gene expression data, which may affect 
our results; for example, we have not directly accommodated incomplete lineage sorting which 
can mislead phylogenetic inference (Guerrero and Hahn, 2018; Hibbins et al., 2020). Similarly, 
while Fuzzy C-Means clustering is conceptually similar to topic ("grade of membership") models 
used in population genetics, its underlying assumptions may be violated for gene expression 
and binary encoded gene expression data. More detailed studies are necessary to determine if 
our results are robust to potential sources of error such as model mis-specification, small 
sample sizes, and non-random taxon sampling, as well as incomplete lineage sorting, the latter 


of which we were unable to directly test. 
Conclusions 


Previous critiques of statistical methods to infer ancestral states, particularly in the 
context of parity mode evolution in squamates, have suggested that ancestral state 
reconstructions of morphological characters must be supported by additional kinds of biological 
support such as anatomical, physiological, and ecological evidence, to be persuasive (Griffith et 
al., 2015). Here we explored the evolution of parity mode and placental invasiveness in 
amniotes utilizing comparative gene expression data. While our study also relies on statistical 
methods to infer ancestral (gene expression) states, this approach is orthogonal to traditional 
methods that infer ancestral states from morphological characters among extant species. 
Indeed, gene expression ultimately underlies the development, evolution, and function of 
anatomical systems. Thus, by reconstructing the evolution of entire transcriptomes we may be 
able to infer function of ancestral cell, tissue, and organ systems. Our results resolve several 
evolutionary transformations during the origins of pregnancy, including the early evolution of an 
epitheliochorial-like placenta in the mammalian stem-lineage, a hemochorial placenta in the 
ancestor of eutherians, multiple reversions to non-invasive epitheliochorial placentas within 
some eutherian lineages, convergent evolution of gene expression profiles among species with 
different ontogenetic origins of epitheliochorial placentas, and maternal control of placental 


invasiveness. 
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Materials and Methods 
Endometrial Gene Expression Profiling 


We previously published a dataset of uterine endometrial transcriptomes, which is also 
used in this study. Interested readers are referred to Mika, Marinic, and Lynch (2021) for 
specific details. Briefly, we searched the NCBI BioSample, Short Read Archive (SRA), and 


Gene Expression Omnibus (GEO) databases using the search terms “uterus”, “endometrium”, 
“decidua”, “oviduct”, and “shell gland”. These anatomical terms refer to the glandular portion of 
the female reproductive tract, which is specialized for maternal-fetal interactions or shell 
formation. We then manually curated transcriptomes and excluded those that did not indicate 
whether tissue samples were from pregnant or gravid tissues and datasets composed of pooled 
tissues. Gene expression data were analyzed with Kallisto (Bray et al., 2016) version 0.42.4 to 
pseudo-align the raw RNA-Seq reads to reference transcriptomes and to generate transcript 
abundance estimates (see Figure 2-source data 1 for accession numbers and reference 
genome assemblies); Kallisto was run using default parameters, bias correction, and 100 


bootstrap replicates. 
Gene expression phylogeny and ancestral transcriptome reconstruction 


We used the binary encoded endometrial transcriptome dataset for phylogenetic 
analyses and to reconstruct ancestral gene expression states. Gene expression phylogenies 
were inferred with IQ-TREE2 (Nguyen et al., 2015) using the best-fitting model of character 
evolution determined by ModelFinder (Kalyaanamoorthy et al., 2017). The best fitting model 
was inferred to be the General Time Reversible model for binary data (GTR2), with character 
state frequencies optimized by maximum-likelinood (FO), and a FreeRate model of among site 
rate heterogeneity with three categories (R3) (Soubrier et al., 2012). The rate at which 
characters evolve may vary over time, with the same character evolving rapidly or slowly in 
different lineages. This phenomenon, known as heterotachy, can bias phylogenetic trees using 
models of evolution that assume the rates of character evolution are constant such as the 
GTR+FO+R3 model. Therefore we compared the GTR2+FO+R3 model to the General 
Heterogeneous evolution On a Single Topology (GHOST) model; The GHOST model 
accommodates heterotachy by combining features of mixed substitution rate models (Foster 
2004; Lartillot and Philippe 2004; Pagel and Meade 2004), whereby each class of characters 
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has its own substitution rate matrix, and mixed branch length models (Kolaczkowski and 
Thornton 2004; Meade and Pagel 2008), whereby each class of characters has its own set of 


branch lengths. 


We found the best-fit GHOST model was GTR2+FO*H4 (AlCc = 215681.71), which 
accommodated rate heterotachy with 4 categories, described the binary encoded gene 
expression dataset better than the standard GTR2+FO+R3 model (AlCc = 220930.65) 
indicating that there is extensive variation in the rate of character evolution over time. However, 
while there was a significant likelihood difference between GTR2+FO+R3 and GTR2+FO*H4 
(AlCc difference = 5248.94), as well as other heterotachy with models with less than or more 
than 4 categories, the topology of the trees was the same. Thus, while accommodating 
heterotachy improves estimation of parameters of the substitution model it had no affect on the 
tree topology and we therefore use the computationally simpler GTR2+FO+R3 model for 
downstream analyses; This is particularly important for non-parametric topology tests and 
ancestral state reconstructions (discussed below), which as currently implemented in l|Q-TREE2 


cannot accommodate heterotachy models with unlinked model parameters. 


Ancestral gene expression states for each gene were inferred using the empirical 
Bayesian method implemented in IQ-TREE2, the GTR2+FO+R3 model of character evolution, 
and the species phylogeny as a constraint tree (Figure 3B). Branch support was assessed 
using the standard (StdBoot) and ultrafast (UFBoot) bootstraps, which assess the effects of 
sampling bias on branch support (Hoang et al., 2018; Minh et al., 2020). We also used several 
single branch tests, including the SH-like aLRT and the parametric aLRT (Anisimova et al., 
2006; Guindon et al., 2010), aBayes (Anisimova et al., 2011), and the local (LBoot) bootstrap 
tests (Minh et al., 2020); single branch tests assess whether a branch provides a significant 
likelihood improvement compared to a null hypothesis that collapses the branch to a polytomy 
but leaves the rest of the tree topology unaltered. We considered a clade to be highly-supported 
if its StdBoot support = 80%, UFboot 2 95%, SH-aLRT 2 80%, aBayes 2 0.90, parametric aLRT 
2 0.95, and LBoot 2 90% (Anisimova et al., 2011). 


The bootstrap and single branch tests assess the robustness of individual branch 
bipartitions and cannot directly compare complex alternate tree topologies. Therefore we used 
non-parametric topology tests to directly compare the inferred ML tree to alternative trees with 
the correct phylogenetic placement of platypus, armadillo, dog, marsupials (opossum and 


wallaby), and bat, as well as the correct species phylogeny (Figure 2B); tests included the BP- 
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RELL, KH-test (Kishino et al., 1990; Kishino and Hasegawa, 1989), SH-test (Anisimova et al., 
2011; Guindon et al., 2010), c-ELW (Strimmer and Rambaut, 2002), weighted KH- and SH- 
tests, and the AU-test (Shimodaira, 2002). We note that the KH-test compares two a priori 
defined trees rather than the ML and alternative trees (Goldman et al., 2000) and does not 
correct for multiple hypothesis tests, it is included solely for comparison to other methods and 
previous studies. The SH-test can be used to compare the ML tree to multiple alternative trees 
selected a priori (i.e., is dataset independent) and corrects for multiple hypothesis tests, but is 
too conservative when many trees are tested. The AU-test, in contrast, resolves the 
conservative nature of the SH-test and thus is the preferred test. All tests performed 100,000 


resamplings using the RELL method. 
Clustering methods 


We evaluated multiple methods to summarize and visualize the binary encoded extant 
and ancestral reconstructed transcriptomes, including: 1) Logistic Principal Component Analysis 
(LPCA), a version of principal component analysis for dimensionality reduction of binary data 
(https://cran.r-project.org/web/packages/logisticPCA/vignettes/logisticPCA.html); 2) classical 
Multi-Dimensional Scaling (MDS); 3) Uniform Manifold Approximation and Projection (UMAP); 
4) tSNE; and 5) Fuzzy C-Means (FCM) clustering. All clustering analyses were conducted in R 
after removing columns (genes) with missing data (coded as ?/NA) or that were invariant (all 0 
or all 1). LPCA was performed using the logisticPCA R package (Landgraf and Lee, 2015), 
which implements three methods: exponential family PCA applied to Bernoulli data, logisitic 
PCA, and the convex relaxation of logistic PCA. For each of the methods, we fit the parameters 
assuming two-dimensional representation, returning four principal components (ks=4), and 
selecting the best m value to approximate the saturated model for cross validation. MDS was 
performed using the vegan R package (Oksanen et al., 2008) with four reduced dimensions. 
UMAP was performed using the umap R package. tSNE was performed using the Rtsne R 


package. 


To explore the data in greater detail we focused on FCM clustering because the results 
were qualitatively similar to the other methods, and it has several desirable properties including 
providing a statistically sound way to identify clusters rather than an ad hoc approach that might 
be applied to the other methods. FCM also allows each sample to have membership in multiple 
clusters and is conceptually similar to topic ("grade of membership") models used in population 


genetics to visualize private and shared genetic structure across populations. FCM membership 
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coefficients can thereby account for multiple sources of similarity including noise, phylogenetic 
signal, and convergence of gene expression. FCM was performed in R using the cluster R 
package, using Manhattan distances (cluster membership was not altered by using other 
distance metrics), and an estimated fuzzifier (m=1.034978). FCM clustering requires a priori 
knowledge of the number of clusters (K) to include, therefore we evaluated FCM with K=2-9 
following the suggestions given in https://www.r-bloggers.com/2019/01/10-tips-for-choosing-the- 
optimal-number-of-clusters/. First, we used the “elbow” method, in which the sum of squares of 
each cluster number is calculated and graphed and the optimal number of clusters estimated by 
a change of slope from steep to shallow (the elbow). We also assessed the optimal number of 
clusters using the clustree R package, which assess the optimal number of clusters by 
considering how samples change groupings as the number of clusters increases; clustree is 
useful for estimating which clusters are distinct and which are unstable but cannot determine the 


optimal number of clusters (K). 
CIBERSORT analyses 


We inferred the proportion of different cell-types in the endometrium during pregnancy 
across our comparative gene expression datasets using CIBERSORT (Newman et al., 2019), 
which takes as input a file of gene expression levels from a mixed cell population and a gene 
expression signature file with expression levels of marker genes in specific cell-types. 
CIBERSORT was run on the bulk RNA-Seq data from each species using a signature gene 
expression file based on the Vento-Tormo et al. scRNA-Seq dataset. The signature gene 
expression file included gene expression data (TPM-like) for genes with an expression level 
greater than or equal to the expression threshold that also have at least five-fold higher 


expression levels in a particular cell compared to all other cells (Jain and Tuteja, 2021). 
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908 Figure 1. Evolution and arrangement of the maternal-fetal interface and degree of 


909 __— placental invasion in viviparous Amniotes. 


910 (A) Phylogenetic relationships among amniote lineages, mammalian lineages labeled. Major 
911 evolutionary steps in the evolution of pregnancy are shown for Mammalia, Theria, and 
912 Eutheria. 

913 (B) Placenta classification based on the arraignment of the maternal-fetal interface and 
914 degree of invasiveness. Epitheliochorial placenta, in which placental invasion does not 
915 occur and the barrier between maternal blood and the chorion (tan cells) consists of 
916 maternal vascular endothelium (red cells) and uterine epithelium (blue cells). 
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Endotheliochorial placenta, in which the placenta invades through the uterine epithelium 
and the barrier between maternal blood and the chorion consists of the maternal 
vascular endothelium. Hemochorial placenta, in which the placenta invades through both 
the uterine epithelium and vascular endothelium, and maternal blood directly bathes the 
chorionic villi. Examples of species with each type of placenta are also shown. While the 
egg passes through the glandular uterine portion of the oviduct in oviparous species, the 
immunologically inert shell (black) prevents direct contact between maternal and fetal 


tissues. 
Figure 2. Binary encoding uncovers phylogenetic signal in transcriptome data. 


(A) Upper, Principal Component Analysis (PCA) of gene expression levels (TPMs) grouped 
species randomly, consistent with significant noise. Lower, scree plot showing percent of 
variance explained by dimensions 1-10. Note that the flat curve of variance explained 
indicates there are no significant dimensions to the PCA. 

(B) Upper, logistic PCA of the binary encoded endometrial transcriptome dataset groups 
species by phylogenetic relatedness, indicating significant noise reduction in the binary 
encoded dataset. Lower, scree plot showing percent of variance explained by 
dimensions 1-10. Note that the "elbow" of the graph is around dimension 4, suggesting 


that dimensions 1-4 of the PCA are significant. 


Figure 2—Source data file 1. Species names (common and binomial), genome annotations 


used for RNA-Seq analysis and parity mode. 
Figure 2—Source data file 2. Binary encoded endometrial transcriptome dataset. 


Figure 3. Maximum-likelihood (ML) phylogeny of binary encoded endometrial 


transcriptome data. 


(A) ML phylogeny of binary encoded endometrial transcriptome data inferred by IQ- 
TREE under the GTR2+FO+R3 model. Highly-supported branch splits, i.e., those 
with SH-aLRT 2 80%, LBoot 2 90%, aBayes 2 0.90, UFboot = 95%, StdBoot support 
2 80%, and parametric aLRT 2 0.95, are shown with black circles. Branch splits that 
are highly-supported by at least 4 of 6 methods are shown with gray circles. 
Particularly discordant phylogenetic relationships are shown in bold. Oviparous 
species are in black, viviparous species with epitheliochorial placentas in blue, 


endotheliochorial placentas in orange, and hemochorial placentas in red. Lizard 
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species are shown with genus names in italics. 

(B) Phylogenetic relationships of species in (A). Discordant phylogenetic relationships 
compared to the ML tree are shown in bold. 

(C) Tree topology tests comparing the maximum likelihood tree (GTR2+FO+H4) shown 
in panel A to alternate trees that correct the phylogenetic placement of specific 
species or the species tree shown in panel B. Delta Log L: log L difference from the 
maximal Log L in the set (A). bp-RELL: bootstrap proportion using RELL method. p- 
KH: p-value of one sided Kishino-Hasegawa test. p-SH: p-value of Shimodaira- 
Hasegawa test. p-WKH: p-value of weighted KH test. p-WSH: p-value of weighted 
SH test. c-ELW: Expected Likelihood Weight. p-AU: p-value of approximately 
unbiased (AU) test. + indicates tree is within the 95% confidence sets, — significant 
exclusion of tree from the 95% confidence sets. All tests performed 100000 


resamplings using the RELL method. 


Figure 3-figure supplement 1. ML phylogeny of binary encoded endometrial transcriptome 
data inferred by IQ-TREE under the best-fitting GTR2+FO+R3 model. Branch supports are 
shown for SH-aLRT (%) / LBoot (%) / aBayes / UFBoot (%) / StdBoot (%) / parametric aLRT. 
Clades are considered highly-supported if its SH-aLRT 2 80%, LBoot 2 90%, aBayes 2 0.90, 
UFboot 2 95%, StdBoot support 2 80%, and parametric aLRT 2 0.95. Particularly discordant 
phylogenetic relationships are shown in bold. Oviparous species are in black, viviparous species 
with epitheliochorial placentas in blue, endotheliochorial placentas in orange, and hemochorial 


placentas in red. Lizard species are shown with genus names in italics. 
Figure 4. Fuzzy C-Means clustering of extant and ancestral transcriptomes. 


(A) Species phylogeny with extant species and ancestral lineages colored according to 
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Fuzzy C-Means cluster membership (K=2-6) shown in (B). Note that ancestral 
nodes are numbered. Lizard species are shown with genus names in italics. 

(B) Fuzzy C-Means clustering of extant and ancestral transcriptomes (K=2-6). Extant 
species and ancestral lineages colored according to maximum degree of cluster 
membership shown in (C). Lizard species are shown with genus names in italics. 

(C) Fuzzy C-Means cluster membership of extant and ancestral transcriptomes (K=2-6). 
Degree of cluster membership is shown as a 100% stacked bar and colored 
according to proportion of membership in each cluster. Lizard species are shown 


with genus names. 


Figure 4—figure supplement 1. Exploratory clustering of extant and ancestral binary encoded 
transcriptomes, including: 1) Exponential Family Principal Component Analysis (PCA); 2) 
Logistic Principal Component Analysis (LPCA); 3) Uniform Manifold Approximation and 
Projection (UMAP); 4) t-distributed stochastic neighbor embedding (t-SNE); and 5) Hierarchical 


clustering (Manhattan distances). 
Figure 4-figure supplement 2. Exploratory analyses of Fuzzy C-Means cluster number. 


(A) FCM cluster number estimate using the elbow plot method with K=1—20, which was 
uninformative in choosing the optimal number of clusters. 

(B) Exploratory analyses of Fuzzy C-Means cluster number using the clustree method with 
K=1-11. In this figure the size of each node corresponds to the number of samples in 
each cluster, and the arrows colored according to the number of samples in each 
cluster. Transparent arrows (the incoming node proportion), show how samples from 
move between lower and higher cluster numbers and can be used as an indicator of 
cluster instability. In this graph samples switch clusters between K=1-5, are stable 
between K=6-9, and unstable with more than 9 clusters. These data indicate that 6 
clusters is optimal for resolution, i.e., more than 6 clusters no longer provides 


informative cluster memberships, while more than K29 is over-clustered. 
Figure 5. Convergent loss of RORA expression in species with epitheliochorial placentas. 


(A) Ancestral reconstruction of RORA expression. Pie charts at each internal node 
indicate the Bayesian Posterior Probability (BPP) that RORA is expressed (state 1, 
black) or not expressed (state 0, gray) at that node. Extant species are colored 
according to RORA expression (TPM22 = expressed, black; TPM<2 = not 
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expressed, gray); note that the RORA is not annotated in the dunnart genome and is 
therefore indicated as “?”. Degree of placental invasiveness is shown for extant 
species: Hemochorial (red), endotheliochorial (orange), epitheliochorial (blue), 
oviparous (black). Inference of an endometrial population of ILC2/3 cells from 
deconvolution of bulk RNA-Seq datasets from each species is shown for extant 
species: Species with ILC2/3 cells and RORA expression (blue dots), species 
without ILC2/3 cells and RORA expression (yellow dots), and species without ILC2/3 


cells and without RORA expression (white). 


(B) RORA expression in single-cell RNA-Seq data from cell-types at the first trimester 


human maternal-fetal interface. Expression data is shown as a violin plot of log 
transformed counts for each cell-type. Cell-types are color coded by their location at 
the maternal-fetal interface: Blood (grey), decidua (pink), placenta (embryonic 


chorion, green). 


Figure 5 — source data 1. Nexus file of the binary encoded endometrial transcriptome dataset 


with gene names and species phylogeny. 
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